metrics: add workload-level preemption count gauge#11252
Conversation
✅ Deploy Preview for kubernetes-sigs-kueue ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: gyliu513 The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
cc @amy ptal |
|
|
||
| // +metricsdoc:group=workload | ||
| // +metricsdoc:labels=namespace="the namespace of the preempted workload",workload="the name of the preempted workload",cluster_queue="the ClusterQueue of the preempted workload",reason="eviction or preemption reason" | ||
| WorkloadPreemptionsCount *prometheus.GaugeVec |
There was a problem hiding this comment.
If you want to model similar to PreemptedWorkloadsTotal, please rename count to total.
| }, append([]string{"preempting_cluster_queue", "reason", "replica_role"}, extraLabels...), | ||
| ) | ||
|
|
||
| WorkloadPreemptionsCount = prometheus.NewGaugeVec( |
There was a problem hiding this comment.
The Help text says: "Uses a Gauge (rather than a Counter) so that the time series can be deleted when the workload finishes or is removed, keeping cardinality bounded."That rationale is incorrect. CounterVec supports DeletePartialMatch exactly the same way. In fact, ClearClusterQueueMetrics at pkg/metrics/metrics.go:1038 already deletes PreemptedWorkloadsTotal (a CounterVec) via DeletePartialMatch.
Using a Gauge for a value that only ever increments is a Prometheus anti-pattern: it breaks rate()/increase() and the reset-detection semantics those functions rely on. Operators who try to chart preemption rate per workload will get wrong answers.
Please switch to CounterVec and update the help text. The variable should also be renamed accordingly (WorkloadPreemptionsTotal, with metric name workload_preemptions_total), matching both Prometheus naming convention and PreemptedWorkloadsTotal for ClusterQueue.
| Uses a Gauge (rather than a Counter) so that the time series can be deleted when the workload finishes or is removed, keeping cardinality bounded. | ||
| This gauge is deleted when the workload is completed or deleted.`, | ||
| }, []string{"namespace", "workload", "cluster_queue", "reason"}, | ||
| ) |
There was a problem hiding this comment.
Please mirror PreemptedWorkloadsTotal. Add replica_role and extraLabels.
| EvictedWorkloadsTotal, | ||
| EvictedWorkloadsOnceTotal, | ||
| PreemptedWorkloadsTotal, | ||
| WorkloadPreemptionsCount, |
There was a problem hiding this comment.
As an example, per-LocalQueue metrics are gated by LocalQueueMetrics: pkg/metrics/metrics.go:1344, RegisterLQMetrics().
Per workload has an even higher cardinality. Please add a featuregate for per workload metrics.
|
|
||
| finishedCond := apimeta.FindStatusCondition(wl.Status.Conditions, kueue.WorkloadFinished) | ||
| if finishedCond != nil && finishedCond.Status == metav1.ConditionTrue { | ||
| metrics.ClearWorkloadPreemptionMetrics(wl.Namespace, wl.Name) |
There was a problem hiding this comment.
When workloadRetention.afterFinished is configured, the workload object is retained for some duration after it finishes, but this call wipes its preemption history immediately on the first finished reconcile. An operator inspecting a retained finished workload (which is the whole point of the retention feature) will see zero preemptions even if it was preempted many times.
Move the metrics.ClearWorkloadPreemptionMetrics(...) call into the branch where we actually delete the workload (line 268, just before/after r.client.Delete), and rely on the Delete event handler at line 1141 for the no-retention path. That way the metric lifetime matches the object lifetime.
What type of PR is this?
/kind bug
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #11171
Special notes for your reviewer:
Does this PR introduce a user-facing change?